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Abstract 

This paper defines a new transport metric over the space of non-negative 
measures. This metric interpolates between the quadratic Wasserstein and 
the Fisher-Rao metrics and generalizes optimal transport to measures with 
different masses. It is defined as a generalization of the dynamical for¬ 
mulation of optimal transport of Benamou and Brenier, by introducing a 
source term in the continuity equation. The influence of this source term 
is measured using the Fisher-Rao metric, and is averaged with the trans¬ 
portation term. This gives rise to a convex variational problem defining our 
metric. Our first contribution is a proof of the existence of geodesics (i.e. 
solutions to this variational problem). We then show that (generalized) 
optimal transport and Fisher-Rao metrics are obtained as limiting cases of 
our metric. Our last theoretical contribution is a proof that geodesics be¬ 
tween mixtures of sufficiently close Diracs are made of translating mixtures 
of Diracs. Lastly, we propose a numerical scheme making use of first order 
proximal splitting methods and we show an application of this new distance 
to image interpolation. 


1 Introduction 

Optimal transport is an optimization problem which gives rise to a popular class 
of metrics between probability distributions. We refer to the monograph of Vil- 
lani [ {ilO ] for a detailed overview of optimal transport. We now describe in 
an informal way various formulations of transportation-like distances, and ex¬ 
pose also our new metric. A mathematically rigorous treatment can be found in 
Section 2. 

1.1 Static and Dynamic Optimal Transport 

In its original formulation by Monge, optimal transport takes into account a 
ground cost that measures the amount of work needed to transfer a measure 
towards another one. This formulation was later relaxed by Kantorovich [ Tan42] 
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as a convex linear program, often referred to as a “static formulation”. Given a 
ground cost c : fl 2 i—M, C M. d and two probability measures po and p\ on fl, 
the central problem is that of minimizing 


/ • 

JVtxQ 


c(x,y)d'y(x,y) 


( 1 ) 


over the set r( P0jPl ) of coupling measures, that is non-negative measures 7 on 
fl 2 admitting po and pi as first and second marginals. More explicitly, for every 
measurable set A E fl, 7 (A, fl) = po(A) and 7 (fl, A) = pi(A). When the ground 
cost is the distance |x — p| p , the minimum induces a metric on the space of 
probability measures which is often referred to as the p-Wasserstein distance, 
denoted by W p . 

In a seminal paper, Benamou and Brenier [BBOO] showed that this static 
formulation is equivalent to a “dynamical” formulation, where one seeks for a 
“geodesic” p(£, x) that evolves in time between the two densities. More precisely, 
they showed that the squared 2-Wasserstein is obtained by minimizing the kinetic 
energy 

r 1 r 

\v(t, x)\ 2 p{t, x)dxdt 



0 JQ 


where p interpolates between po and pi and v is a velocity field which advects the 
mass distribution p. Introducing the key change of variables (p,v) 1 —(p, cj) — 
(p, pv), this amounts to minimizing the convex functional 



\u(t,x)f 


dxdt 


/o Jn 

over the couples (p, cj) satisfying the continuity equation 


( 2 ) 


dtp+V -u = 0, p(0,-) = po, p(l,-) = p 1 . (3) 


In particular, their paper emphasized the Riemannian interpretation of this met¬ 
ric and paved the way for efficient numerical solvers. 

The geometric nature of optimal transport makes this tool interesting to per¬ 
form shape interpolation, especially in medical image analysis [ 2TA04]. Even 
though the resulting interpolation is somewhat simplistic compared to methods 
based on diffeomorphic flows such as LDDMM [ MTY05], the fact that the prob¬ 
lem is convex is interesting for numerical purposes and large scale applications. 
A major restriction of optimal transport is however that it is only defined be¬ 
tween measures having the same mass. This might be an issue for applications in 
medical imaging, where measures need not be normalized, and biophysical phe¬ 
nomena at stake typically involve some sort of mass creation or destruction. To 
date, several generalizations of optimal transport have been proposed, with var¬ 
ious goals in mind (see Section 1.3). However, there is currently no formulation 
for which geodesics can be interpreted as a joint displacement and modification 
of mass. In this article, we propose to tackle this issue by interpolating between 
the Wasserstein and the Fisher-Rao metric, which can be achieved using a convex 
dynamical formulation. 
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In the remainder of this section, we introduce our model and describe related 
work which generalizes optimal transport to unbalanced measures. In addition, 
we introduce in an informal manner a family of distances which shows the similar¬ 
ity between a priori unrelated models of transport between unbalanced measures. 
In Section 2, we prove the existence of geodesics and exhibit optimality condi¬ 
tions. Section 3 is devoted to the study of the effect of varying the interpolation 
factor and to the limit models. We detail in Section 4 explicit geodesics between 
atomic measures satisfying suitable properties. Finally, we describe a numerical 
scheme in Section 5 and use the distance for image interpolation experiments. 

1.2 Presentation of the Model and Contributions 

The dynamical formulation (2) suggests a natural way to relax the equality of 
mass constraint by introducing a source term £ in the continuity equation (3) as 
follows 


d t p + 'V-u = (, p(0, •) = po, p(l,-) = Pi, 


and adding a term penalizing £ in the Benamou-Brenier functional, in the spirit 
of metamorphoses introduced in imaging in [ CY05]. We think it is natural to 
require the following two properties for the penalization on £: 

1. Reparametrization invariance. As the geometrical aspect of the problem is 
taken care of by the optimal transport part, it is important for the penalty 
defined on £ not to be sensitive to deformations of the objects. 

2. Relation with a Riemannian metric. Riemannian metrics are natural for 
interpolation purposes and for applications to imaging. Moreover, the stan¬ 
dard notion of curvature is available in a Riemannian setting and give some 
important informations on the underlying geometry. 

It has been shown recently that the only Riemannian metric which is invariant by 
reparametrization on the space of smooth positive densities on a closed manifold 
of dimension greater than one is the Fisher-Rao metric [ dBMR]. It was intro¬ 
duced in 1945 by C. R. Rao, who applied for the first time differential geometry 
in the field of statistics [ tao4!:]. The squared Fisher-Rao distance between two 
smooth positive densities po and pi is defined as the minimum of 



over the couples (p, £) such that dtp = £, p(0, •) = po and p(l, •) = p\. Its 


geodesics are explicit and given by p(t,rr) = t\Jp\(x) + (1 — t)^po(x) . Note 


that this metric can be rigorously defined on the space of non-negative measures 
(see Definition 6), but it is not Riemannian anymore. 

The considerations above lead us to the problem of minimizing 
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under the constraints 


d t p + V-u = C, p(0,-) = po, p{l,-) = Pi, (5) 


where 5 > 0 is a parameter. The metric defined by this infimum as well as the 
minimizers are the central objects of this paper. While formula (4) only makes 
sense for positive smooth densities, we rigorously extend this definition on the 
space of non-negative measures in Section 2. This extension is well behaved 
since the functional that is minimized is convex and positively 1-homogeneous, 
two key properties which allow to extend it readily as a lower semi-continuous 
(l.s.c.) functional over measures [ >B90]. 

Let us now give a physical interpretation of this model: just as the first term 
in the functional measures the kinetic energy of the interpolation |M|| 2 ( p ) where 
v is the velocity field, the second term measures an energy of growth ||g||^ 2 ( p ) 
where g denotes the rate of growth £/p which is a scalar field depending on t and 
x. In this way, our problem can be rewritten as the minimization of 


1 

2 



JQ 


\v(t, x)| 2 p(t, x)dx + S 2 / \g(t, x)| 2 p(t, x)dx 

Jn 


d t 


under the constraints 


d t p = gp-V ■ (pv), p(o,-) = p 0 , p(l,-) = pi. 

This formulation emphasizes the geometric nature of this model. 


1.3 Previous Works and Connections 

The idea of extending optimal transport to unbalanced measures is far from new, 
and is already suggested by Kantorovitch in 1942 (see [3ui02]) where he allows 
mass to be thrown away out of the (compact) domain. The geometry of the do¬ 
main also plays a role in the more recent models of [ G10] where the boundary 
of the domain is taken as an infinite reserve of mass. In the past few years, some 
authors combined generalizations of optimal transport with numerical algorithms 
in order to tackle practical applications. In [ >en03], the problem is formulated 
as an optimal control one where the matching term is the L 2 distance between 
densities. The dynamic problem remains convex and is solved numerically with 
an augmented Lagrangian algorithm. A quadratic penalization is also suggested 
in [ IRSS15] but based, this time, on the dynamic formulation, with the model 
of metamorphosis is mind: they use constraint (5) and add the term ff ( 2 dxdt in 
the Benamou-Brenier functional (along with a viscous dissipation term). From 
the perspective of computing image interpolations [ Ml 3] suggest, instead of pe¬ 
nalizing ( in the dynamic functional, to fix it as a function of p, according to 
some a priori on the growth model. Then, if the a priori is well chosen, mini¬ 
mizing the Benamou-Brenier functional gives rise to meaningful interpolations. 
Note that one of our limit models (see Definition 5) falls into this category. 

Now we introduce with more details the classes of distances introduced in 
[ 3 R14, PR13], along with partial optimal transport distances, as they share some 
similarities with our approach. 


4 




Partial transport and its dynamical formulation. Partial optimal trans¬ 
port, introduced in [ 110 ] consists in transferring a fixed amount of mass be¬ 

tween two measures po and pi as cheaply as possible. Letting r<(p 0 ,p 1 ) be the set 
of measures on x ft whose left and right marginals are dominated by po and pi 
respectively, this amounts to minimizing ( 1 ) under the constraints 7 E r<( po pi ) 
and 17|(J1 x Q) = m. What happens is rather intuitive: the plan is divided 
between an active set, where classical optimal transport is performed, and an 
inactive set. A theoretical study of this model is done in [CM 10 ] and [ 4gl ], 
where for instance results on the regularity of the boundary between the active 
an inactive sets are proved. For the quadratic cost, the Lagrangian formulation 
for this problem [CM10, Corollary 2 . 11 ] reads 

inf f (\x - y\ 2 - A) y ). ( 6 ) 

7^1 <(po,pi) Jnxn 

where choosing the Lagrange multiplier A amounts to choosing the maximum 
distance mass can be transported—which is y/X —and consequently, the amount 
m of mass transported. 

Independently, a class of distances interpolating between total variation and 
the W p metrics has been introduced by [ R14, PR13], taking their inspiration 
from the optimal control formulation of [ 30 ]. This distances are defined as 

the infimum of 

W p (po, pi) + 5 • (|po — po| + |Pi — Pi I) 

over po, pi, non-negative measures on Q, for p > 1 . Now remark that, for fixed po 
and pi and up to changing 5, the previous problem is equivalent to minimizing 

Wp(po, Pi) + SP ■ (Ipo - Pol + \pi ~ pi\) (7) 

To our knowledge, the equivalence between this model and partial optimal trans¬ 
port has not been exposed yet although this allows an interesting dynamical 
formulation for the well studied partial transport problem. This is the object of 
the following proposition. 

Proposition 1.1 (Equivalence between partial OT and Piccoli-Rossi distances). 
When choosing 2 S 2 = A and p = 2, problems ( 6 ) and (7) are equivalent. 

Proof. First, it is clear that problem (7) is left unchanged when adding the dom¬ 
ination constraints po < po and pi < pi ([ R13, Proposition 4]). This implies 
that | pi — pi | is equal to pi(fl) — fii(fl) in such a way that (7) can be rewritten as 

inf f (\x -y\ 2 ) dj(x,y) + 5 2 (p 0 (f2) + pi(V) - 2^(9, x ST)) 

T' er <(po./n) Jnxn 

This is clearly equivalent to the Lagrangian formulation ( 6 ). □ 


The interest of this result is that it allows a dynamic formulation of the partial 
optimal transport problem. Indeed, it is shown in [ R13] that minimizing (7) is 
equivalent to minimizing the following dynamic functional 

rl r r \u(t,x)\ 2 


j J 

J o Un 


-dx + 5 A 


[ |C(i,x)|dx 
Jn J 


/n P(t,x) 

subject to the continuity constraints with a source (5). 


d t 


( 8 ) 
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The case W±. When chosing p = 1 in (7), one obtains another partial optimal 
transport problem, with | • | as a ground metric. This distance was already 
introduced by [ lan92] in view of the study of Lipschitz spaces. Without providing 
a proof, it is quite clear that this distance is also obtained by minimizing 


f \u(t,x)\dx + 5 ( \((t,x)\dx 

) IJn Jn 


d t 


(9) 


over the set set of solutions to (5). 


A unifying framework: homogeneity and convexity. We conclude this 
bibliographical section by suggesting a unifying framework for a class of general¬ 
izations of the W p metrics. Consider the problem of minimizing 


rv-t 

J 0 IP Jn P‘ 


1 f ICI 9 

dx + s p - I -im dx 


jp- 1 


Q Jn P 


,<7-1 


d t 


over the set of solutions to (5). The functional penalizes the p -th power of the 
L p p norm of the velocity field and the q -th power of the L q p norm of the rate 
of growth, the latter being reparametrization invariant. Whatever p,q > 1, 
the above functional can be rigorously defined as a homogeneous, convex, l.s.c 
functional on measures (as in Section 2). Also, this problem can be seen as the 
dual problem of maximizing the variation of an energy 

/ V?(V)Pi - / ¥>(0,OA) 

Jq Jq 


over continuously differentiable potentials cp on [0,1] x ft satisfying 

d t ip + ^Av¥>| p/(p_1) + i^l?/(9-i) < o 

P <7 


with some adaptations when p or q takes the value 1 or +oo. 

For p — q — 1, we recover (9) and for p — 2 and q — 1, we recover (8), 
implying the connections to partial optimal transport arising from our previous 
remarks. While there is an interesting decoupling effect between the growth term 
and the transport term when p or q is equal to 1, one should choose in general 
p = q > 1 so as to define (the p -th power of) a metric. In this article, we focus 
on the case p = 2, q = 2, which is the only Riemannian-like metric of this class, 
namely our interpolated distance between W 2 and Fisher-Rao. 


1.4 Relation with [ 1V1 ] 

After finalizing this paper, we became aware of the recent work of [ VIV 15] 
which defines and studies the same metric. These two works were done simulta¬ 
neously and independently. The approach of [ ^MV15] is however quite different. 
While both works prove existence of geodesics, the proofs rely on different strate¬ 
gies, and in particular our construction is based on convex analysis. The use of 
tools from convex analysis allows us to prove a useful sufficient condition for the 
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uniqueness of a geodesic. We study the limiting models (generalized transport 
and Fisher-Rao), through the introduction of the 5 parameter. We prove exis¬ 
tence and uniqueness of travelling Diracs solutions, while [ V15] provides an 

informal discussion. Lastly, we detail a numerical scheme and show applications 
to image interpolation. Let us stress that [ dV15] provides many other contri¬ 

butions (such as a Riemannian calculus, topological properties and the definition 
of gradient flows) that we do not cover here. 

1.5 Notations and Preliminaries 

In the sequel, we consider an open bounded convex spatial domain D C 
d G N*. The set of Radon measures (resp. positive Radon measures) on X (X is 
typically D or [0,1] x D), is denoted by M{X) (resp. Af + (X)). Endowed with 
the total variation norm 


MOO d = sup | J Lpdfi : y € C{X ), l^loo = 1 

M(X) is a Banach space. We identify this space with the topological dual of 
the space of real valued continuous functions C(X) endowed with the sup norm 
topology. Another useful topology on M(X) is the weak* topology arising from 
this duality: a sequence of measures (fi n )neN weak* converges towards (i G M(X) 
if and only if for all ip G C(X) : lim n ^ +00 f x (pdfi n = f x Lpdfi. Any bounded sub¬ 
set of M{X) (for the total variation) is relatively sequentially compact for the 
weak* topology. 

We also use the following notations: 

• fi <C v means that the measure \i is absolutely continuous w.r.t. v\ 

• for a (possibly vector) measure (i , \fi\ G A^+(X) is its variation; 

• for a positively homogeneous function /, we denote by f(/i) the measure 
defined by f(fi)(A) = f A f(d/i/d\)d\ for any measurable set A, with A satisfying 
\/i\ <C A. The homogeneity assumption guarantees that this definition does not 
depend on A; 

• T#/i is the image measure of fi through the measurable map T : X\ —» X 2 , 
also called the pushforward measure. It is given by T#/i(A. 2 ) — fi{T~ l (A 2 )); 

• 5 X is a Dirac measure of mass 1 located at the point x ; 

• ic is the (convex) indicator function of a convex set C which takes the value 
0 on C and +00 everywhere else; 

• if / is a convex function on a normed vector space E with values in 
] — 00 , + 00 ], /* is its dual function in the sense of Fenchel-Legendre duality, i.e. 
for y in the topological dual (or pre-dual) space E' : f*(y) — sup xeE (x , y) — f{pc). 
The subdifferential of / is denoted df and is the set valued map x i-G {y G E' : 
Mx’ e E, f{x') - f(x) > {y,x' - z)}. 

Finally, when fi is a measure on [0,1] x D which time marginal is absolutely 
continuous with respect to the Lebesgue measure on [0,1], we denote by (fit)te[ 0 , 1 ] 
the (dt-a.e. uniquely determined) family of measures satisfying fi — / 0 1 (/^t®^t)dt, 
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where 0 denotes the product of measures. This is a consequence of the disinte¬ 
gration theorem (see for instance [AGS08, Theorem 5.3.1]) which we frequently 
use without explicitly mentioning it. In order to alleviate notations, we replace 
the integral expression of fi by fi — fit 0 dt. 


2 Definition and Existence of Geodesics 

In order to prove that the minimum of the variational problem is attained, rather 
than using the direct method of calculus of variation, we choose to take advantage 
of the convex nature of our problem and prove it by Fenchel-Rockafellar duality. 
Indeed, we show that (a generalization of) problem (4) is obtained by considering 
the dual of a variational problem on the space of differentiable functions. This 
point of view allows to kill two birds with one shot: proving the existence of 
geodesics and exhibiting sufficient optimality conditions. 


2.1 Definition of the Interpolating Distance 

Consider the closed convex set 

B s d = | (a, b, c) € R x R d x R : a + I ^ \b \ 2 + |J) < 0 
and its convex indicator function 


L B S 


(a, b, c) € R d+2 ^ 



if (a, b, c ) G Bs 
otherwise. 


We denote by f$ the Fenchel-Legendre conjugate of lb 5 , he. 


f \y\ 2 +8 2 z 2 

/«5 : (x, ?/, z) E M x K d xK4 <0 

I +OC 


if x > 0, 

if («, \y\,z) 

otherwise 


( 0 , 0 , 0 ) 


which is by construction proper, convex, lower semicontinuous (l.s.c.) and ho¬ 
mogeneous. One can check by direct computations that 


dfs(x,y,z ) = < 




2a; 2 


Bs 


-,«,&) ifi > 0, 

if (x, \y\,z ) = (0,0,0), 
otherwise. 


( 10 ) 


We denote M = M{[ 0,1] x Q) and for fi — (p, cj, () G M x M d x M we define 
the convex functional 



where A is any non-negative Borel measure satisfying |/i| <C A. As fs is ho¬ 
mogeneous, the definition of D$ does not dependent of the reference measure 
A. 





Definition 1 (Continuity equation with source). We denote by C£q(po, pi) the 
affine subset of M x A4 d x M of triplets of measures p — (p,uj,() satisfying 
the continuity equation dtp + V • uo = £ weakly, interpolating between po and pi 
and satisfying homogeneous Neumann boundary conditions. This is equivalent 
to requiring 


[ [ d t (fd p+ [ [ V(p*dcj + f [ ip d( = [ <p(l,-)dpi-/ (p(0,-)dp 0 (H) 
Jo Jn Jo Jn Jo Jn Jn Jn 


for all (p G C 1 ([0,1] x Q). 


Remarks. 


• The set C£q(po, Pi) is not empty: it contains for instance the linear inter¬ 
polation (p, 0, dtp) with p = (tp\ + (1 — £)po) 0 d t. Moreover, if we assume 
that po and pi have equal mass, then there exists (p, cj,£) G C£q(po, pi) 
such that C = 0. This is a consequence of [AGS08, Theorem 8.3.1]. 

• If we make the additional assumption that a; < p and ( < p (which is 
satisfied as soon as D$(p) is finite), we find that the time marginal of p is 
absolutely continuous w.r.t. the Lebesgue measure, allowing to disintegrate 
it in time. 

• Under the same assumptions, p admits a continuous representative, i.e. it 
is dt— a.e. equal to a curve which is weak* continuous in time. 

• When p is disintegrate in time, the map t i—pt(fl) = f^Pt admits the 

distributional derivative Pt(fi) = f° r almost every t G [0, 1]. Indeed, 

taking a test function (p constant in space in (11) gives 

[ <p'(£)pt(ft)dt + [ <p(t)Ct(fydt = <p(l)pi(0) - (p(O)po(fl). 
jo jo 


We are now in position to define the central object of this paper as a 

map Al+(0) x Al+(£1) -G M + . 

Definition 2 (Interpolating distance). For po and pi in .A/f+(fl) x Al+(£1), the 
(squared) interpolating distance is defined as 

WF|(po,Pi) d = inf D s (fi). (V M ) 

The following result shows that WF^po, pi) is always finite. 

Proposition 2.1 (Bound on the distance). Let po and pi be in Al+(£1). The 
following bounds hold: 

WFp Po , Pl ) < 2 < 5 2 |( v VV- v^) 2 |(0) 

< 28 2 (po(fl) + pi(fi)). 
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Proof. Consider the triplet of measures p, = (p,o;,£) with 


' P = {ty/pi + (1 - t)^f <g> d t , 

< uo — 0 , 

S = 2(y/pi - VPo){tyfpi + (1 - t)y/po) ® di, 

and notice that it belongs to C£q(pq, pi)- Taking A € A4+ such that //, <C A, the 
first bound comes from 



and does not depend on A by homogeneity, while the second bound is obtained 
in the worst case when the supports are disjoint. □ 

Remark. Notice that the first bound is tight and is actually the Fisher-Rao metric, 
defined in Definition 6. It is the solution of (Vm) when adding the constraint 
uo = 0 (see Theorem 3.1). 

2.2 Existence of Geodesics 

We now prove the existence of geodesics. The proof mainly uses the Fenchel- 
Rockafellar duality theorem—which can be found in its canonical form along 
with its proof in [ /il03, Theorem 1.9]—and a duality result for integrals of convex 
functionals. 

Definition 3 (Geodesics). Geodesics are measures p G jW+ for which there 
exists a momentum uo and a source £ such that (p, cj, Q is a minimizer of (Vm)- 

Definition 4 (Primal problem). We introduce a variational problem on the space 
of differentiable functions which is defined as follows 

inf _ J(ip) d = [ f i Bs (d t <p,'V<p,<p)+ [ <p(0, -)dp 0 — f <p(l,-)dpi. 

Jo Jn Jn Jn 

(TV) 

Theorem 2.1 (Strong duality and existence of geodesics). Let po an d pi in 
A4+(D). R holds 

WFs(po,Pi) 2 = ~ inf _ J(ip) 

y.eC'MMxQ) 

and the infimum in (Vm) is attained. 

Proof. First remark that WFs(po, pi) is necessarily finite from Proposition 2.1. 
Let us rewrite (fPc 1 ) as 


inf 

tpec 1 ([o,i]xQ) 


F(A^) + G(p) 
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where 


A : ip i ^ (d t (p,Vtp,<p ), 

F : (a, (3, 7 ) i->- / / iB 6 (o;(i,a;),/ 3 (i,a:), 7 (i,a:))dxdt, 

JO JO 

and G:(p\-> <p(0,-)dp 0 - / <p(l,-)dpi. 

jo jo 


Remark that T and G are convex, proper and lower-semicontinuous. It is easy to 
find a function <£> G C l {[ 0, 1 ] x O) such that F is continuous at Aip, taking A(p(t, x) 
in the interior of the closed set B$ for all (£, x) is enough. The Fenchel-Rockafellar 
duality theorem garantees the following equality 


inf _ jUp)= max {— F*(fi) — G*(— A*(/jl))} . ( 12 ) 

(/ u)eMxM d xM 

In the second term, we recognize the continuity constraint 


G*(— A*/i) = sup 

(pec 1 ^ o,i]xO) 


-(^,/x)+ / <^(l,-)dpi- / ^(0, -)dpo 

JO JO 


0 if fj, € C£j(p 0 ,Pi) 
+oo otherwise. 


For the first term, as the Fenchel conjugate of lb 8 is /$, we can apply [ foc71, 
Theorem 5]. This theorem gives an integral representation for the conjugate of 

integral functionals by means of the recession function /°°, defined as f°°(z ) d =‘ 
sup t f(x + tz)/t, for x such that f(x) < +oo. More precisely, 



where ££ is the Lebesgue measure, gr is any singular measure which dominates 
the singular part of fi and f$° is f§ itself by homogeneity. Finally we obtain 
= Ds(/x) : which allows to recognize the opposite of ( Vm ) in the right- 
hand side of (12). The fact that the minimum is attained is part of the Fenchel- 
Rockafellar theorem. □ 


The following Theorem shows that we have defined a metric and shows two 
useful formulas. 


Theorem 2.2. WF$ defines a metric on A4+(fi). Moreover, we have the equiv¬ 
alent characterizations, by changing the time range 


WFs(p 0 , Pl ) = \ inf r [ h{%) dA 

peC£5(p 0 ,pi) J[ o,t]xQ \dA 


with t > 0, and by disintegrating // in time: 

WFs(po,Pi)= mf [ { [ fs ) dA* } d t 

/iGCSq(po,pi) jo l JO \O.At y 

where, as usual, we choose (At)te[o,i] suc h ^at for all t G [0,1], fit \ t . 


(13) 


(14) 
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Remark. The difference between minimizers of (Vm) and (14) is that for the for¬ 
mer, the integrand is constant in time (as a consequence of (14)), i.e. minimizers 
of (Vm) satisfy for almost every t G [ 0 , 1 ] 

WF s (po,Pi) = Jjs(^j dA t 

while this is not necessarily the case for minimizers of (14). 

Proof. We obtain characterization (13), by remarking that if T : t r • t is 
a time scaling and p = (p,u, () <E C£b(po,pi) then p = (T#p, € 

C£q(po, pi). For the proof of (14), the arguments of [DNS09, Theorem 5.4] ap¬ 
ply readily. Let us now prove that WF$ defines a metric. By definition, it is 
non-negative and it is symmetric because the functional satisfies D$(p,uo,Cf) — 
D$(p , — —Q so time can be reversed leaving D§ unchanged. Finally, the trian¬ 
gle inequality comes from characterization (14) and the fact that the continuity 
equation is stable by concatenation in time i.e. for r G]0, 1[, if p\ G C£o(po,p r ) 
and p 2 G CS\{p T ,p{) then Mil[o,r] + P>2^[ t ,i] G C£q(pq,pi). □ 

Proposition 2.2 (Covariance by mass rescaling). Let a > 0. If p is a geodesic 
for WFs(po, p±) then ap is a geodesic for WFs(apo^api) and 

WF s (ap 0 ,api) = ^aWF s (p 0 , Pi) 

Proof. It is a direct consequence of the homogeneity of Dy □ 

2.3 Sufficient Optimality and Uniqueness Conditions 

We now leverage tools from convex analysis in order to provide a useful condition 
ensuring uniqueness of a geodesic. We use this condition to study travelling 
Diracs in Section 4.2. 

Lemma 2.1. The subdifferential of D§ at p = (p, uj,C) such that D$(p) < +oo 
contains the set 

d c D s (p ) = | (a, /?, 7 ) G C([0, 1 ]xQ;B s ) :a + ^ (|^| 2 + = 0 - p a.e., 

/3p — (jj and yp = 5 2 £ j. (15) 

Remark. This set is exactly the set of continuous functions of the form (cq/3, 7 ) 
which take their values in df$ (defined in (10)) for all (t,x) G [0,1] x D. Note 
that d c D$ can be empty. 

Proof. Take p, G M. d+ 2 and choose p 1 - G M.+ such that p and p 1 - are mutually 
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singular and A = p + p ± satisfies |//,| + \fi\ -C A. We have 



= ((«,/?, 7), A “ M) 


and inequality (*) holds for every /2 if for all (t,x) € [0,1] x fl, (a,/3, ^)(t,x) € 
dfs ( ^(£, : this comes from the definition of the subdifferential and from the 
fact the fs = Lg § . □ 

Theorem 2.3 (Sufficient optimality and uniqueness condition). Let (po>Pi) £ 
If p = (p, cj,£) £ a77, d exists <p G C 1 ([0,T] x 0) sac/i 

that 

{dtp, V<p,<p) G d c D s (p) 

then p is the unique geodesic for WFs(po , pi). From now on, we will refer to such 
a <p as an optimality certificate. 

Proof. Consider another triplet p G C£o(po 5 pi)- We have, from the demonstra¬ 
tion of the previous lemma 

As(p) - Aj(p) > V<p, ip),p- p) = 0 


showing that p is a minimizer of Dy For the proof of uniqueness, let us consider 
another minimizer p = (p,&,£) G C£o(po 5 pi)- It holds 


D s (p)> [ [ d t (fdp+ [ [ V(pdo;+ [ [ 
Jo Jn Jo Jn Jo Jn 


= / <p(V)dpi- / <p(0, -)dp 0 
J Q JQ 

= As (A) 


(16) 

(17) 

(18) 


where we used successively: duality, the fact that A € C£g(po>Pi) and the fact 
that the primal-dual gap vanishes at optimality. So, the first equality is an 
equality, and hence A a.e., (dt<p,V<p,<p)(t,x) £ dfs (h§(^, x )) ^ or ^ suc ^ 

/i « A. And thus, from the characterization of dfs , we have 


p= (p, (V<p)p,<pp). 

But p G C^o(pojPi) and thus cr = (p CT , a/ 7 , ( a ) d =' p — p G C£q(0,0). Recall (see 
the remarks after Definition 1) that the map t pf(Ll) admits the distributional 
derivative ( f (D), for almost every t G [0,1]. As a consequence, 

pW(t) < ll<^Hoo^(0)(t) 
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since <p is continuous on the compact domain ft. But p CT (Jl)(0) = 0 thus p a = 0 
and p — p. Consequently p — p. 

□ 

Remark (Relaxation). The above condition is not necessary: the optimality cer¬ 
tificate (p is sometimes not as smooth as required by this condition. It is an 
interesting problem to study the space in which the infimum of (fPc 1 ) is at¬ 
tained, as it would give a necessary optimality condition. For instance, in the 
case of dynamical optimal transport, this space is that of Lipschitz functions 
[ fimO£]. In a different setting, for a class of distance which interpolates between 
optimal transport and Sobolev, [ PN13] proves that the optimum is attained 
in BV Pi L°°, where BV is the set of functions with bounded variation. This 
question is non-trivial and beyond the scope of the present paper. 


3 Interpolation Properties 

We have considered 6 as a constant so far, in this section we study the effect 
of varying d for a given problem. This allows us to explain to which extent our 
metric really interpolates between optimal transport and Fisher-Rao. 

3.1 Change of Scale 

Intuitively, the smaller 5, the finer the scale at which mass creation and removal 
intervene to make initial and final measures match. Accordingly, we show in 
the following proposition that changing 6 amounts to changing the scale of the 
problem. 

Proposition 3.1 (Space rescaling). Let T : (t,x) E R x R d (t,sx) be the 
spatial scaling by a factor s > 0, and S e]0, +oo[. If p = (p, c <;, () is a minimizing 
triplet for the distance WF$(po, pi), then p = (T#p, sT^uo, T#£) is a minimizing 
triplet for the distance WF s s(T#po,T#pi). 

Proof. It is easy to check that p = (p, a), £) satisfies the continuity equation with 
Neumann boundary conditions on T(ft) between T#pa and T#p#. Also, we can 
write 

Dssi £)=/ f sS (^)d\ 

Jt([ o,i]xfi) \d\y 

where A = T#A and A is any non-negative Borel measure satisfying \p\ <C A. By 
denoting = (p A , uo x , ( x ) we have (t, x) — (p A (t, x/s), scj x (t, x/s), ( x (t , x/s)). 
Thus by the change of variables formula, we obtain 

D s s(p) = [ f sS (p X ,suj x , C A )dA 

*/ [0,1] x O v ' 

= s 2 D s (p). 

This shows that if p minimizes D$ then p minimizes D s $ and reciprocally, hence 
Proposition 3.1. □ 
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Remark. The proof of Proposition 3.1 holds true as soon as f§ is of the form 
fi(p,uj,() + $ p f2(pX) where fi is p-homogeneous in uj and is independent of 
uj. Hence the choice of 5 P as an interpolation factor when introducing the various 
models in Section 1.3: so that S can be interpreted as a spatial scale. 

The following result is closely related and will be used later to prove unique¬ 
ness of geodesics between remote Diracs. 

Proposition 3.2 (Monotonicity of the distance w.r.t. rescaling). Using the same 
notations as in the previous result, if s < 1, then Ds(p) < D$(p) and thus 
WFs(T#po,T#pi) < WFs(po, pi)- Moreover, if s < 1 and uj ^ 0 these inequalities 
are strict. 

Proof. We use the same change of variables as in the proof of Proposition 3.1 
and remark that f§ is strictly increasing w.r.t. its second variable. □ 


3.2 Two variational problems on non-negative measures 

The main Theorem of section 3.3 states that the geodesics of the quadratic 
Wasserstein and the Fisher-Rao metrics are both recovered (in a suitable sense) 
when letting the parameter S go to +oo and 0, respectively. We start by defining 
two variational problems, before showing their connections to WF$ in the next 
subsection. 

Let us introduce 

D B b ■■ (p,u) D s (p,uj, 0) and D FR : (p, C) ^D s (p, 0, (). 

Those real valued functionals represent respectively the Benamou-Brenier and 
the Fisher-Rao terms in the interpolating functional D$. They do not depend on 
S and are defined such that Ds(p,uj,() — Dbb(p^) + 5 2 Dfr(p, (). 

Definition 5 (Generalized Benamou-Brenier transport problem). Given (po,pi) 
E .Ad+(fi), we introduce the time dependent rate of growth 


9 ■ (t,x) 



if po(S2) = pi(fi) 
otherwise 


(19) 


with to = a/ p o(D) ^y // po(D) — \J pi(0)j . The generalized Benamou-Brenier 
transport problem is defined as 

inf D B b(p,u) (20) 

p, w 

subject to (p, uj, gp ) G C£l(p 0 ,pi) . (21) 


and we denote by d g BB(po , Pi) the square-root of the infimum. 

Notice that d g BB belongs to the class of models for unbalanced transport of 
[LM13] which prescribe the source term ( as a function of p. In order to picture 
what is happening, consider (po?Pi) £ A1+(D) such that p o(D) < pi(Q): in that 


15 





case, t 0 <0 and the rate of growth g is positive, constant in space and decreasing 
in time. While d g BB does not define a metric as it does not satisfy the triangle 
inequality, we prove below that it is closely related to the quadratic Wasserstein 
metric. 


Proposition 3.3 (Relation between d g BB and W 2 ). Let po, Pi £ Al+(fl). We 
introduce po, Pi the rescaled measures which mass is the geometric mean between 
p 0 (fi) and i.e. such that for i G {0,1}, p%{Pt) = y/po(£l)pi(£l) and pi = 

&iPi for some ai > 0. It holds 

dgBB(po,Pi) = 2 ^ 2 (^ 0 , Pi) ) 

Moreover, the minimum in (20) is attained and minimizers can be built explicitly 
from the geodesics o/W 2 (po,pi). 


Proof If po — 0 or p\ — 0, the minimum is attained if and only if uj — 0 and 
thus the d g BB (po> Pi) — 0. If po(fi) = pi(fl) then g — 0 and the conclusion 
follows directly. Otherwise, we have to ^ [0,1] and we will bring ourselves back 
to the case of a standard Benamou-Brenier problem by a rescaling and a change 
of variables in time, similarly as in [ M13, Proposition 7]. Consider 


and 


s = {(Piu) : ( p,u,gp ) G C£l(po,pi), M < p] 

So = {( p,U3 ) : (p,uj, 0) G C£q(po, |w| < p] . 


Take p = (p, uf) in S. It satisfies (weakly) the ordinary differential equation 

dtptiSl) — g(t)pt($l) and so pt(fl) = Po(^) a.e. in [0,1]. We define the 

rescaling 


R \ S 1 — y So 

which is a bijection as to ^ [0,1]. Writing Dbb(R(p,oj)) yields a Benamou- 
Brenier functional with a time varying metric; we now counterbalance the latter 
by a change of variables in time. Consider the strictly increasing map s : t G 

t^5j which satisfies s(0) = 0, s(l) = 1 and s'(t) = > 0. Let t = s -1 

and introduce 


T : So ^ So 

T{p,u) = (pot(s),t'(s) • oj o t(s)) 

which is a bijection (see [AGS08, Lemma 8.1.3]). The rescaling followed by the 
change of variables in time induces a bijection T o R : S —)• So- Of course, 
the expression for s has been chosen after an analysis of its effect on the Dbb 
functional. Take (p, Co) = R(p,cj) G <So and (p, cj) —To R(p,uo) G So- All those 
measures can be disintegrated in time from the definition of S and <Sq. In order 
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to alleviate notations, we introduce fBB(pt^t) — Jq fs ^-,0j dA^, where 
X t e Ad+(fl) satisfies \(pt,ut)\ <C \t- We have 



u; s )ds = 



(t' O s (t)) 2 f BB (pt,u>t)s'(t)dt 
fBB(Pt,&t) 

s'(/) 


to r 1 f to -1 

to — 1 Jo V *0 


2 

fBB(Pt,Ut)dt 



Hence the relation 


Dbb{p,u) 


Pi(V) 

pom 


1/2 

Dbb{T o R(p,u )), 


from which we deduce d gB B{po,Pi ) = | (^§y) / ^(po, j^njpi), as the min- 
imum of the standard Benamou-Brenier problem is attained in <Sq- Finally, by 
remarking that ^(apo? &Pi) — V&W 2 (po, pi) for a > 0, we obtain the formula¬ 
tion of Proposition 3.3. □ 


Definition 6 (Fisher-Rao metric). Given (po, Pi) G At+(0), the Fisher-Rao 
distance is defined as 


d 2 FR (po,Pi) = inf D FR (p, C) 
p,C 

subject to (p, 0, C) e CEl(po , pi). 


We now prove uniqueness of Fisher-Rao geodesics and give their explicit ex¬ 
pression. This result is well known in a positive, smooth setting. 

Theorem 3.1. The geodesics for the Fisher-Rao metric are unique and have the 
explicit expression 

P = (Vpi + (1 - t)^/po) 2 ® dt. (22) 

Proof. First notice that the expression (22) is not ambiguous since it is posi¬ 
tively homogeneous as a function of (po>Pi)- The fact that dpR defines a metric 

is proven from the same arguments as in Theorem 2.2. As for the proof of unique- 

dcf 

ness of geodesics, we organize the proof as follows. Posing v —' po + pi, we will 
first show that the geodesics are absolutely continuous w.r.t. v ® dt, allowing 
to restrict the problem to trajectories which time marginals are L 1 (dz/). Using 
convex duality, we then show that (22) is a geodesic, and this allow us to ex¬ 
hibit an isometric injection (the square root) into L 2 (dz/), from which we deduce 
uniqueness. 
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i. Duality. Written explicitly, the constraint in Definition 6 is equivalent to 



o Jn 


%>dp+ [ [ <pdC= [ tp(l,-)dpi- f <p(0,-)dpo (23) 

Jo Jn Jn Jn 


for all test functions cp G C fl ([0,1] x(l). Similarly as in Theorem 2.1, we obtain 
that the problem defining the Fisher-Rao metric is the dual of 


sup 

^eCRt o,i]xfi) 


[ <p(l,-)dpi - [ 
Jn Jn 


¥>(0, -) d Po 


subject to, for all (t,x) G [0,1] x D, 


(dt<p(t,x),<p(t,x)) G B fr where B FR 


(a, b) : a + - b 2 < 0 


By the Fenchel-Rockafellar theorem, we obtain the existence of minimizers and 
standard arguments show that geodesics are constant speed and that d FR defines 
a metric on M+. 


ii. Stability of geodesics. Take /x = (p, Q a minimizer for d FR (po, pi). First 
of all, we can always construct paths of finite energy (take (22) for instance), so 
it holds D FR (p) < +oo and thus (Cp. As a consequence, we can disintegrate 

p in time. Now decompose p t w.r.t. v d =' po + Pi as p t = (r(t, •), z(t, -))z/ + 
(p^,(X). We remark that (r(£, •), z(t, -))v satisfies the constraints (23) and is 
thus a minimizer. Thus, (X = 0 because p is also a minimizer and consequently, 
p 1 - — 0 by condition (23). Thus, p — (r, z)z/0d£, with r(£, •), z(t , •) G L 1 (dz/) for 
all t G [0,1]. 


iii. Value of d FR . For the moment, assume that po = ro^, pi = r\v and that 
there exists k > 0 such that v a.e., 

^ < R) 

K — T\ 



Consider p — (p, C) where p is defined as in (22), i.e. 


p = (ty/n + (1 - t)y/ro) 2 u 0 d t 

C = 2( v / rl - - (1 - t)v^o)^ ® dt. 


This couple satisfies the constraints (23) and D FR (p) — 2 f^iy/pi ~ y/~Po ) 2 - Now 
consider 

*def. r/ 0 if ro(x)=n(x), 

* 9 l t=fe otherwise. 

with to(x) = yTo (yTo — yTi) X Under the assumption (Al), there exists e > 0 
such that for all x G D, to(x) ^ [—e, 1 + e] and thus, for all t G [0,1], <p*(t, •) is 
bounded and belongs to L l (dv). Also, for all x G D, <p*(-,x) G C 1 ([0,1]). But, 
in general, <p* ^ C fl ([0,1] x O) as it lacks regularity w.r.t. the space variable. 


18 



def def 

We introduce thus its regularized version: (p e — rf * <p*, where p e (£,x) — 
e~ d a(x/e) with a G 1/2, l/2) rf ), a > 0, f a = 1 and a even. By convexity 

of Bfr , we have for all (£, x) G [0,1] x fi, (c?t<p*, <p*)(£, x) G =>* (rf *^<p*, rf * 
(/p*)(t,x) G => (<%(p e ,(p e )(t,x) G Thus 

<*Ffl(Po,Pi) > lim / </? € (l,-)dpi - / ^ e (0,-)dpo 

= / <P*(V)dpi- / <X(0, -)dpo 

«7n ,7ft 

= 2 [ (VPi - CPo) 2 - 

Jn 

As this value is also an upper bound, this shows that, under (Al), d^^Pm Pi) = 
^ 4(v^ — V^) 2 * But this resu ^ remains true without any assumption on po 
and p\. Indeed, by introducing p\ — epo + (1 — e)pi and Pq = epi + (1 — e)po, the 
triangle inequality yields 

dFR(PoiPi) -d F R(p e nPi) < dFR(p e o,pi) < d F R(Po, Pi) + dra(Pi, Pi) 

and lim e ^o G?F^(Pi 5 Pi) = 0 because we have a vanishing upper bound. Hence 
^Fi^(Po’Pi) = ^ Iq(VPi ~ \/Po) 2 - Repeating this operation for pg -G po we have 
that df^PcnPi) = 2 Iq(VR ~ \/Po) 2 for A), Pi G VMW 

iv. Isometric injection. Finally, on the space M, v = {pG At+(£7) : p <C z'}, 
the map 

(■ Mv,dpR ) —» (L 2 (dz/), || • || 2 ) 

p — au i—)► y/2a 

is an isometric injection. As geodesics in L 2 (dz/) are unique, uniqueness holds for 
geodesics in M v and thus for geodesics in «M+(f2). □ 

The following lemma motivates the expressions for d g BB and dpR- 

Lemma 3.1 (Alternative characterizations). For all (po,Pi) G Ad+(£7) we have 

dgBB(po,Pi) 2 = , min D B b{p,u) 

(p,u)£Afr 

dFR(po, Pif = , min D FR (p, () 

(p,C)eA BB 


where 


Afr ='■ 


argmin Dfr(pX) and 
(p ) w,C)eC£j(po,Pi) 


Abb ='■ 


argmin D B b(p,u) 
hwC)eC£o(po,pi) 


( 24 ) 


Proof. What we need to show is 

■Abb = {(p, C) : (p, 0, C) G C£o(Po, Pi)} 
Afr = {(p,w) : (p,u,gp) e C£j(p 0 ,Pi)} , 
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where the rate of growth g is defined in (19). The first equality is easy because 
Dbb(p , cj) = 0 if and only if uo — 0 a.e. 

The second equality requires slightly more work. In the case when p o(£l) = 
pi(fl), Dfr is minimized if and only if £ = 0 as previously. The case p o(fl) = 0 
or pi(fl) = 0 is dealt with in in Proposition 4.2 (case a = 0). Otherwise, let 
us determine Afr by adapting the optimality condition in Theorem 2.3. The 
argument is more clear if we consider Dfr as a function of p — (p, instead 
of just (p, £) as cj is a variable of the problem. We thus introduce 

B F r d = l(a, /?, 7 ) € R x R d x R : a + y < 0 , /3 = 0 

and it can be shown (similarly as in Section 2.3) that 8Dfr(p,ojA) is equal to 

| (a, <E C([0, T] x Q;B fr ) : a + y = 0- p a.e. and jp = ( 

Adapting the sufficient optimality condition given in Theorem 2.3, it holds that 
if p G C£q(po, pi) and if there exists (p G C 1 ([0,1] x Cl) satisfying (dtp, Vp,p) G 
9D F r(p) i.e. 

dtp + \p 2 < 0 (with equality p a.e.), 

V<p = o, 

PP = C 

then p G Afr • The equality p a.e. is actually an equality d t a.e. in time as p 
has a positive mass and p is constant in space. All solutions are thus of the 
form p : (t,x) i—^ 3 ^. The integration constant to is found by integrating 
the variations of mass. As to ^ [0,1], p is well defined and is equal to g as 
introduced in (19). Now we prove that this condition is actually necessary: take 
(p,cj,C) G AfR' It holds 

s(l)Pi- / #( 0 )po = Dfr(p,w,Q 
Jn 

where we used successively: duality, the fact that (p, cj,() G C£q(po, pi) and 
the fact that the primal-dual gap vanishes at optimality. So, the first equality 
is an equality, and hence (dtg,g)(t,x) G d/i(dp/d|p|, 0, d£/d|p|)(£, x) for |p| a.e. 
(t,x) G [0,1] x fl, which implies that ( = gp. □ 


Dfr(p,u, () > 


[ fdtgp + gC=[ 

Jo Jn Jn 


3.3 Convergence of geodesics 


A rigorous treatment of the limit geodesics requires some compactness for the set 
of geodesics when S varies. That is why we need the following bounds, adapted 
from [DNS09, Proposition 3.10]. 


Lemma 3.2. Consider a triplet p = (p, cu,() G M. d+2 such that Dbb(p^) < 
+00 and Dfr(pA) < + 00 . For any non-negative Borel function rj on [0,1] x Cl 
we have 



< V^Dbb 




(25) 
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and 
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A/so ; similar bounds can be written for /it (the disintegration of /a w.r.t. time) by 
integrating solely in space. 

Proof. Take A G A4+ such that \/i\ <C A and decompose d/a as (p A ,cd A , ( x )d\. As 
Dbb(pw) < Too and Dfr(p , C) < Too, we have uo <C p and f </). It follows 



by Cauchy-Schwartz inequality on the scalar product (•, *)L 2 (dA)- Inequality (25) 
follows and (26) is derived similarly. □ 

In the following lemma, we derive a total variation bound on fi which only 
depends on Dbb(p) and Dfr(/i)- 

Lemma 3.3 (Uniform bound). Let po,Pi G A4+(i2) and M G R+. There exists 
C G R+ satisfying |/i|([0,1] x 12) < C, for all /i — (p,u;,£) G C£q(po 5 Pi) 
that Dbb(p , cj) < M and Dfr(p , C) < Af. 

Proof. We start by giving a bound on p([0,1] x ST) = p t (ft)dt. Recall (from 
the remarks after Definition 1) that the map £ i—>> pt(ft) admits the distributional 
derivative p f t (ft) = (t(D), for almost every t G [0,1]. It follows, by Lemma 3.2 

Pt(fy < icti(^) < V^DFn(pBctjpt(n) 

where DpRil^t) denotes f n ./^( TuT )dAt where A* is such that Ht <C Aj. By 
integrating in time and applying the Cauchy-Schwartz inequality 

Pt(fi) - po(fi) < VzdmpX)\L¥J] xn). 

We integrate again in time to obtain 

p([0,1] x fi) < po(fi) + v^^OvTO, 1] xfi) 

which implies that p([0,1] x ft) is bounded by a constant depending only on 
Drr(pX) and pom The conclusion follows by bounding |£|([0,1] x ft) and 
|cj|([0, 1] x ft) thanks to Lemma 3.2. □ 

Let us now recall a well known property of weighted optimization problems. 

Lemma 3.4 (Properties of weighted optimization problems). Let f and g be two 
proper l.s.c. functions on a compact set C with values in M U {Too} such that 
h — f + 5g admits a minimum in C for all S g]0, oo[. Let (S n ) ne ^ be a sequence 
in ] 0, Too[ and (x n ) ne ^ a sequence of minimizers of h n — f T S n g. We introduce 

A(f) d = argmin xeC f( x ) an d A{g) similarly. 
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1. For all n G N, x? G A(f) and x 9 G A(g) we have f(x n ) < f(x 9 ) and 

9(xn) < g(x f ), 

2. If S n —^ 0 ; then (x n ) ne ^ admits an accumulation point in argmin ^(f)9y 

3. If 5 n —» oo ; then ( x n ) ne ^ admits an accumulation point in argmin^) /. 

Proof. All those results are derived using elementary inequality manipulations. 

□ 

The following lemma gives total variation bounds on geodesics that are inde¬ 
pendent of S , which will be helpful in order to explicit the limit models. 

Lemma 3.5 (A relative compactness result). Let po and p\ in A4+(fl) and con- 

def. 

sider Q p P l — (J 5>0 argmin Dg(po, Pi)- For all p = (p,u, C) £ Qpl, the following 
bounds, independent of 8, hold 

Dbb{p , w) < d 2 gBB (p 0 , pi) , (27) 

Dfr(p, C) < dp R (po,pi). (28) 

Moreover, Q p p \ is relatively compact. 

Proof. Inequalities (27) and (28) are direct applications of Lemma 3.4-1, to the 
characterizations of d g BB and dpR given in Lemma 3.1. As these inequalities do 
not depend on 5, we can deduce from Lemma 3.3 a uniform TV bound on the 
elements of Q p p \. More explicitly, there exists C > 0 such that for all p, G Q p p \, 

H([o, i] x n) < c 

This implies that Q p p \ is relatively compact for the weak* topology. □ 

We can now state the main result of this section. 

Theorem 3.2 (Limit models). Let po and pi in 7W+(fl) and let (p n ,uj n A n )neN 
be a sequence of minimizers for the distance WF$ n between po and p\ with (S n ) n G 

A. If S n —» Too then (p n , uo n , C n )neN weak* converges (up to a subsequence) 

n —^oo 

to a minimizer for d g B b • 

B. If S n —» 0 then (p n ,cj n A n )neN weak* converges (up to a subsequence) 

n—yoo 

to a minimizer for dpR. 

Proof. Let us review the hypothesis of Lemma 3.4 . First, Drb and Drr are 
l.s.c. The existence of minimizers for D§ has been shown in Theorem 2.1 for any 
positive value of 5. Moreover, as a consequence of Lemma 3.5, those minimizers 
are in a compact set—-just take any compact set which contains the closure of 
the relatively compact set of all minimizers. All conditions are gathered for 
Lemma 3.4-(2,3) to be applied, hence the result. □ 
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4 Explicit Geodesics 


In this section, we give an explicit description of the behavior of geodesics in 
three cases: for an “inflating” measure, for the transport of one Dirac to another, 
and finally for the transport of multiple couples of Diracs. In order to establish 
explicit geodesics, we will first exhibit an ansatz and then prove the existence of 
an optimality certificate as defined in Theorem 2.3. 


4.1 No Transport Case: Inflating and Deflating Measures 

The case when there is no transport is dealt with in the following propositions. 

Proposition 4.1 (A uniqueness result). Let (po,pi) £ M+(D) 2 . //(p, 0,pp) is 
a minimizer of (Vm)> where g is the homogeneous rate of growth defined in (19), 
then p is the unique geodesic. 

Proof. Assuming (p, 0,pp) minimizes (Pm), we have ITF 2 (po, pi) = 5 2 Dfr(p, (). 
Moreover, (p,0,gp) G argmin MeC£ i (p0)Pi) D FR (p, C). Thus for any fi = (p,u, C) G 

C£o(po, Pi), Ds(p) = Dbb{p,&>) + 5 2 Dfr{p, C) > Dbb{p,u) + WFg(po, pi). This 
proves that for all minimizers, uo = 0 and thus there is a unique geodesic which 
is the Fisher-Rao geodesic (see Theorem 3.1). □ 

Proposition 4.2 (No transport case). If pi = <apo with a > 0 ; then 

p = (ty/a + (1 - t)) 2 po 0 dt 

is the unique geodesic for WF§, for all S > 0 ; and 

WF$(po, apo) = 8\y/a - 1 |\/ 2 po(^)- 


Proof. If a > 0, consider the function defined on [0,1] x by 


<p(£,x) 


2£ 2 U/T-1) 

{tyjo. + (1 — t)) 


and write the set d c D$(g ) : 


d c D$(g) = |(cr, /?, 7 ) G C([0, l]xfi;Bj) : a + = 0 -p a.e. (5 = 0 

25 2 (^-l) 


and 7 = 


{tyfcx. + (1 — t)) 


— p a.e. 


—p a.e. 
}. (29) 


We can check that (dt<p, V<p, <p) E d c D$(p J ) because, in particular, V<p = 0 and 
dt<p + (p 2 /(25 2 ) = 0. Then, as (p, ca,£) £ C£o(po,pi) the ansatz is a geodesic. 
The distance is found by computing Dg(ja). 


Now if a = 0 the certificate <p we built above is not defined at t = 1 and this 
approach is fruitless. However, by the triangular inequality, for any a > 0, 

W^(poj <*Po) < AT^po, 0) + WF S (0, apo). 
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By Proposition 2.1 we have, WFs(0,apo) < 8y/2apo(Fl). We rearrange the in¬ 
equalities to obtain 

S\y/a — l\y/2po(Q) — 8^2apo(Fl) < WFs(po , 0) < 8y/2po(Fl). 

As this inequality holds for all a > 0 , we obtain WFs(po,0) = 5y/2po{Fl) which 
is the value of the functional evaluated at the ansatz /i for a = 0 , so the proof 
that p is a geodesic is complete. For uniqueness, it only remains to remark that 
dtp — gp (with g defined in (19)) and then to apply Proposition 4.1. □ 


4.2 Transport of One Dirac to Another 


We first solve the case of the geodesic between two Diracs and then generalize this 
result to configurations with more Diracs. In the case of two Diracs, we show 
that if the Diracs are closer than i rfi, then the unique geodesic is a travelling 
Dirac. Beyond that distance, the Fisher-Rao geodesic is the unique geodesic. 


Theorem 4.1. Consider two Diracs of mass ho and h\ and location xq and x\ 
respectively, i.e. po = ho8 Xo and pi = h\8 Xl . We distinguish 3 types of behaviors 
for geodesics of WF$: 

1. Travelling Dirac. If \xi~xo\ < tt 8 then the travelling Dirac p = h(t)8 x ^<g)dt 
implicitly defined by 

j h(t) — At 2 — 2 Bt + ho 
\^h(t)x'(t) = cjq 

is the unique geodesic. Here uo o ; A and B are constants explicitly depending 
on bn, h\ and xi — xo (the relations are given in the proof). 

2. Cut Locus. If \xi—xq\ = 7 t 8 then there are infinitely many geodesics. Some 
of them can be described in the following way: take N G N and choose two 
N-tuples (/io)i=i,...,iV an d {h l i)i=i,...,N of non-negative real numbers satisfy¬ 
ing ho = ^2^ h l 0 and hi = YaLi^i- A geodesic is given by p — J2iLi Pi 
where pi is either the travelling Dirac of point 1. or a Fisher-Rao geodesic 
between Pq8 Xo and p\8 Xl . Notice that a single travelling Dirac or the Fisher- 
Rao geodesics are particular cases. 


3. No transport. If \xi — xq\ > n8 then the Fisher-Rao geodesic 

p = [t 2 hi8 xi + (1 - t) 2 h 0 8 xo \ <g> d t 
is the unique geodesic. 

Corollary 4.1. If \xi — xo\ < tt 8 and if Ft Cl, then the distance is (by formula 

(3i); 


(^'O^c(O) 5 ^l^x(l)) 




ho + hi — 2\[hohi cos 



V 28 \^h[e iX1 /^ - y/^e^°/( 2(5 )| . 


This induces a local isometric injection from the space of Dirac measures on 
]0, 8n[ to C equipped with the flat Euclidean metric. 
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Remark. If ho — hi = h and \x\ — xq\ < ttS then the minimum mass of the 
geodesic is attained at t — 1/2 and its value is 



which is never smaller than h/2. 

Proof. The proof is divided into 4 parts, in the first three parts, D is a segment in 
R. First we derive an ansatz for the geodesics when \x\ — xo\ < nS, and write the 
sufficient optimality conditions. Then comes a part with technical computations 
where we prove the existence of an optimality certificate. In a third part we 
study what happens at the cut locus and beyond. Finally, we extend the proof 
for D E R rf , for any dimension d. 

i. Ansatz and optimality conditions In this part, we consider the case 
where \x\ — xq\ < nd and D is an open segment of R. Let us look for geodesics 
of the form of travelling Diracs, i.e 


P = ® d t 


(30) 


where h and x are functions from R to R, assumed sufficiently smooth so that 
all oncoming expressions make sense. Moreover, we assume h$,hi ^ 0 and 
x\ xo (those special cases are all dealt with in Proposition 4.2). Satisfying 
the continuity constraint imposes uo = h(t)x'(t)5 x ^ 0 d t and ( = h'(t)S x ^ 0 d£, 
so the functional reads 



The Euler-Lagrange conditions imply that minimizers among travelling Diracs 
should satisfy, for some cjq E R, 


h(t)x'(t ) = c^o 
2 h"(t)h(t) — h'(t ) 2 = CcJq / (5 2 
h( 0 ),/i(l), x( 0 ), x(l) fixed. 


Solutions of the second order differential equation are of the form h{t) = At 2 
2 Bt + h 0 where A and B are given by the system 



A — 2 B = h\ — Kq 


This is a second order equation which admits two solutions 



and 
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where e E { — 1, +1}. In order to choose the value for e, we plug the expressions 
in the functional 


Ds(p,u, C) = 

Jo 


1 1 ljq + 5 2 h'(ty 


h(t) 


d t = 2£ 2 ^4 


(31) 


and conclude that e — +1 as otherwise, the functional is greater than the Fisher- 
Rao upper bound given in Proposition 2.1. Also, (31) shows that A > 0 since 
WFs(po, pi) > 0. In order to determine cjq, we integrate the speed to obtain 


X\ — Xq 
UJQ 


r 1 dt 26 r? 

Jo h(t) LJo J_ a i 


^ d^ - = — [arctan(/3) + arctan(cr)] 


+ u z 


LU 0 


with 


def - D A a def - 25 t A T 3\ 

a = —B and p = —( A — B). 

6J 0 CJO 


(32) 


The expression for the sum of arctan depends on the sign of 1 — a(3. Yet we find 


U 2 


1 - 0/3 = ~ w 


which is positive since A > 0. So we are in the case 

QL ~\~ j3 


arctan a + arctan /? = arctan 


1 — a/3 


e [—7r/2, 7 t/2] 


and solutions exist if and only if \x\ — cco| < We introduce r d =' pp and 
by direct computations 


cjq = 2 Sr 


h 0 hi 


and r = tan 


Xi - x 0 


1 + r 2 ^ * V 2 5 

Notice that we can rewrite A and B in a simpler way with the new parameter r 


A — h\ -p ho — 2 


h 0 hi 


1 + T 2 

making it clear that A and B are well defined. 


> 0 and B — ho — 


hphi 
1 + T 2 


ii. Existence of a certificate So far, we have shown that travelling Diracs 
of the form (30) cannot be geodesics if \x\ — xp\ > tvS. Otherwise, there is 
a unique candidate solution and this part is devoted to the construction of an 
optimality certificate in order to show that it is optimal over all measure valued 
trajectories and not only among travelling Diracs. For simplicity, let us take 
5=1 from now on, without loss of generality thanks to Proposition 3.1. By 
Theorem 2.3, ip E C l {\ 0, ll x Q) is an optimality (and uniqueness) certificate if 
for all (£, x) E [0,1] xfi 

dtp + - (( d x p ) 2 + p 2 ) < 0 (33) 
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and on the support of p : 


d x ip(x(t),t) = x'{t) 

(34a) 

... /z / (i) 

»’ (l(t) ' f)= h(t) 

(34b) 

dtp + 2 {(d x p) 2 + T^ 2 ) = 0- 

(34c) 


In view of the equations p should satisfy, we suggest the following optimality 
certificate for the candidate (p, u;,£): 


<p(£, x ) = a[t) cos(x — 9) + b(t) 

Inequality (33) gives, for all (t,x) G [0,1] x O: 

(a!(t) + a(t)b{t )) cos(x — 0) + ^f/(£) + -(a 2 (£) + 5 2 (£))^ < 0 

As equality has to be reached at all points of the form (x (£),£) from condition 
(34c), the only way to satisfy this inequality is to have 


a' + ab — 0 

V + \{a 2 + b 2 ) = 0. 


The solutions to this differential system are of the form 


I a(t) = ^ 

[Ht) = tGr + I=r 2 

Now assume £i ,£2 ^ [0,1], so that a and b are (7°° on [0,1]. This assumption will 
be checked below. 

It remains to check conditions (34a) and (34b). Equation (34a) imposes 
x'(t) = — a(t) sin(x(t) — 9). We first determine x implicitly from this equation 
before checking in a later step that it indeed corresponds to u^/h. For k G Z, 
the function x : £ G [0,1] i-G 9 + kn is a stationary solution. As a is smooth, the 
Picard-Lindelof theorem guarantees the uniqueness of the solution of any initial 
value problem, so solutions do not intersect. Since x is not constant, we have 
V£ G [0,1 \,x(t) ^ {9 + kn,k G Z}, and thus sin (x(t) — 9) does not vanish in [0,1]. 
Let us divide by sin (x(t) — 9) and integrate equation (34a) 


x'(t) 


sin (x(t) — 9) 
and thus 


£ — £2 £ — £1 


1 1 - cos(aV) -6) _ 1 

2 l0g 1 + cos(x(t) — 6) g K 


t — t 2 


£ — £1 


cos(x(£) — 9) — 


K 2 {t - £i ) 2 - (£ - £2)" 


(35) 


ft 2 (£ - £i) 2 + (£ - £2) 2 
for some ft > 0. The value of the integration constant ft will be studied later on, 
but we already see that 9 is determined given t\ and £ 2 : 

,2+2 +2\ 

mod 27r 


9 = x(0) — arccos f—-f^) 
\ ft ^1 T £2 / 


27 














The last constraint we have is equality (34b): a(t)cos(x(t) — 6) + b(t) = 
After several lines of re-arranging one finds this is equivalent to 

K 2 {t - ti) + {t- t 2 ) _ At- B 
K 2 (t — t \) 2 + (t — t 2 ) 2 At 2 — 2 Bt + ho 

By identification, there exists a factor A 2 > 0 such that: 

( A — A2(ft 2 + 1 ) 

< B = A 2 (ft 2 ti + t 2 ) 

[h 0 = A 2 (ft 2 t 2 + t|). 


This can be symmetrized as 


{ A — Ai + A2 

B — \\t\ + (^6) 

h 0 = Ait 2 + A 2 t 2 . 


with Ai > 0, A 2 > 0 (we imposed Ai = \ 2 k 2 ). 

Also, note that gathering conditions (34a), (34b) and (34c) leads to the rela¬ 
tion 2 h'h — (h') 2 = ( x') 2 h 2 and thus, by the Euler-Lagrange conditions satisfied 
by h, ( x') 2 h 2 = cjq. This proves that the function x implicitly defined by (35) is 
the same as that of the ansatz. To sum up, if a solution to system (36) satisfies 
the constraints 

[Ai >0 
< A 2 >0 
1 (^ 2 ) ^[ 0 , 1] 2 


then we have built an 
candidate solutions: 


optimality certificate. System (36) leads to the following 


L l ~ ^ Ak 


*2 = 5 

X 2 — A — Ai 


where k, = y ^ and e E { — 1,1}. The sign of e depends on direction of the 
movement. We choose say, e — 1, the other case is similar. Let us look for a 
solution satisfying, e.g. t\ > 1 and t 2 < 0. This leads to the conditions 


k > a and — > /3 

K 


where a and f3 have been defined earlier in the proof in (32). It is not excluded 
that /? or a be non-positive, in that case it is easy to find k satisfying both 
inequalities. Now, if <a, (3 > 0, the necessary and sufficient condition for a solution 
k to exist, is a(3 < 1, which we have already checked. This concludes the proof 
for the case \x± — xq\ < 5n. 
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iii. The cut locus and beyond Now we study what happens when \xi~xo\ = 
7 r5, which is the cut locus distance. Introduce a Dirac h\S x ^ between hoS XQ and 
h\8 Xl where x(s) — xq + s(x\ — xo). It is not hard to check (see for instance 
equation (31) ) that 

WF$ (h()5 Xo , hi5 x ( s ^) 28 2 {hu + hi) 

v s—y 1 

and that 

WFs{hi5 x{s) MS Xl ) -> 0. 

Combining two triangle inequalities—for the upper and lower bounds—we obtain 

WF 2 (h o&ro} h\5 Xl ) — 25 2 (/io + ^i)- 

This has important consequences. First, it proves that the travelling Dirac is still 
a geodesic at the cut locus, with the trajectory p t = h(t)S x u ) where 

h(t ) = ho(t — l) 2 + h\t 2 and x{t) — 

h[t) 

since D§ evaluated at this trajectory is worth 25 2 (ho + hi). Second, as D$ 
linearly depends on the masses we can build infinitely many geodesics, examples 
are described in the theorem. Finally, the cost at the cut locus equals the upper- 
bound of pure Fisher-Rao (recall Proposition 2.1). 

Thus, by monotonicity of the distance w.r.t. rescaling (see Proposition 3.2), 
the WF$ distance between two Diracs for which \xi — xo\ > tv5 is 5^/2(ho + hi) 
and a geodesic is 

p = [h 1 t 2 6 Xl + h 0 ( 1 - t) 2 5 xo ] <g> d t. 

For uniqueness, we again apply Proposition 3.2 for proving that, when \xi — xq\ > 
7 r5, the component uo of minimizers is zero. Thus the only geodesic is the Fisher- 
Rao geodesic, by Theorem 3.1. 

iv. Extension to arbitrary dimension The extension of the previous results 
for D G d > 1 is rather straightforward and we will do it in such a way so as 
to prepare the ground for the next theorem. Recall that what we want to prove 
(and have proven for d = 1 so far) is that if \xi — xo\ < i r5, then the travelling 
Dirac (as described in the theorem) is the unique geodesic. To show this, we 
choose 6—1 without loss of generality and we introduce the function 

^•^ = (ih- 1 -Th 2 ) cos{lx - en + ih- l + Th 2 (37) 

where the constants £i, £2 and the vector 0 can be found by the same method 
than above by considering the problem on the line spanjxi — xq}. This is pos¬ 
sible because cosine is an even function and 9 G spanjxi — xq }. Now remark 
that the derivative of cos(| • |) is continuous at 0, thus ip G C' 1 ([0,T] x D) and 
by construction satisfies all the conditions to be an optimality and uniqueness 
certificate and the result is shown. The behaviors at the cut locus and beyond 
extend straightforwardly to this case. □ 
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4.3 Matching Atomic Measures 

Geodesics between one couple of Diracs are so far quite well-understood. To go 
one step further in the comprehension of the behavior of our metric, we study the 
geodesics between two purely atomic measures po and p\. We show that under 
some geometrical conditions on the locations of the atoms, the behavior is similar 
to that of a pair of atoms. Those conditions require basically that the atoms be 
arranged by close pairs of Diracs, and the pairs should be far from each other. 

Theorem 4.2. Let po = J2iLi ^ 0^4 an( ^ Pi = J2iLi ^\5 X \ b e th e initial and final 
measures, where h l Q or h\ may be zero. Also suppose that 

• Vi G {1,... A}, \x l 0 — x\\ < 7 t6, 

• V(i,j) € {1,. ..N} 2 ,i ^ j => max a;/9e{0il} {|a;* a - x^\} > Qn5. 

Then p = where for all i € p 1 — h l (t)5 x i^ ® d t is the 

geodesic between h l 0 5 x ^ and h\5 x i . from Theorem 4.1, is a geodesic for WF$. This 
geodesic is unique if all masses Kq, h\ are positive. 

Remarks. 

• A finer analysis could easily lower the “security distance” of Qn5 between 
pairs of Diracs so that they can be considered independently. 

• In figure 1 we give an example where the hypotheses of Theorem 4.2 are 
satisfied. 

• We only prove uniqueness of the geodesic when the certificate is non¬ 
degenerate. But it is likely that uniqueness hold in general under our 
hypotheses. 

Proof. Again we choose 5 — 1 without loss of generality. As for now, assume that 
all masses h l 0 ,h\ are strictly positive, the case of vanishing masses, i.e. “single” 
Diracs, will be fixed at the end of the proof. We aim at building an optimality 
certificate <p G C fl ([0,1] x Ci) for the candidate geodesic of the theorem. Hence <p 
should satisfy, for all (t,x) G [0,1] x D, 

dtv+\( iwf+<y) <0 

and for all t G [0,1], for all i G {1,..., N} : 

< ip(x l (t),t) = ( h l )'(t)/h l (t) 

^dt<p + \ (|V<p| 2 + <p 2 ) = 0 for points of the form (t,x l (t)). 

We introduce for i G {1,... A}, the certificates (p z for the geodesics between 
each couple of Diracs {h\ ) 5 x i^h\5 x P) as defined in equation (37). The three pa¬ 
rameters describing are denoted t\,t l 2 and 6 l . As 6 l is only defined up to trans¬ 
lations, we decide from now on that 9 l is the unique such translation satisfying 
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Figure 1: Configuration in dimension 2 satisfying hypotheses of Theorem 4.2. 
The initial po and final pi measures are composed of the black and red Diracs, 
respectively. At least 1 couple of spheres of radius 37 rS do not intersect for each 
pair of systems. We show that in that case, the geodesic is the sum of two 
travelling Diracs (for the pairs), an on-place decreasing Dirac (single black) and 
an on place increasing Dirac (single red). 


maxH^Q — 0 l |, \x\—6 l \} < i r, which exists under our assumption that \x l 0 —x\\ < i r. 
Thus, from the hypotheses in the theorem, we have that min^j{|6^ — 6 J: |} > 47t. 
Now consider the binding functions, defined for i G {1, ..., TV} by 

(pl(£, x) =-^ cos(|x — 0*1) H- 

’ 7 t-tl Vl 17 t-tl 


Each p l h is in phase with p l and satisfies 

<Pb(t> nu + ° l ) = T—J = ™ + ° l ) 

t t 2 


and 2i tu + 6 l ) — 0 


for all u G \u\ — 1 and t G [0,1]. Also they are solutions to the equation 
d't<p\ + \ ((V^) 2 + (ipl ) 2 ) = 0. Now, as the constant zero function is a solution 
of this p.d.e too, we introduce 


p(t,x) 


r p l (t, x) if \x — 6 l \ < 7T 

< (p l b (t, x) if \x — 6 l \ < 27r and \x — 6 l \ > i r 

^0 otherwise. 


Under the hypotheses of the theorem, only one condition is met at a time so this 
function is well defined. It belongs to C 1 ([0,1] x Cl) because the bindings are 
located where the gradients vanish, on the extrema of cosines. See Figure 2 to 
picture how the bindings are performed between the p l s. By construction, all 
hypotheses are fulfilled so that p be an optimality and uniqueness certificate. 

Now if one or more of the h l Q or h\ vanish, the same reasoning as in the proof 
of Theorem 4.1 allows to compute the distance between po and p\ (by an upper 
and lower bound, making the mass of vanishing Diracs tend to zero), and all that 
remains is to see that replacing the travelling Dirac by a Fisher-Rao geodesic for 
single Diracs reaches that cost. □ 
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Figure 2: Example of an optimality certificate for 4 pairs of Diracs, in the case 
d — 2 at fixed t. The centers of the bumps are the 6 l s. At the position (t,Xi(t)) 
of a travelling Dirac, the gradient gives its speed and the height gives its rate of 
growth. 


5 Numerical Results 

This section presents a numerical implementation of the computation of geodesics 
for our metric. The source code to reproduce these results is available online 1 . 

5.1 Discretization 

The numerical resolution of the dynamical optimal transport problem is often 
performed with first order methods using proximal splitting algorithms ([BBOO], 
[ D P01 ]). These methods extend with no difficulties to the case of transport 
with sources, as long as the proximal operator of the new functional can be 
easily computed. In order to compare a few models of optimal transport with 
source, we implemented a Douglas-Rachford proximal splitting algorithm on a 
staggered grid, in the footsteps of [ D P014]. The introduction of the source term 
comes with a few adjustments so we describe the numerical method that we 
implemented, in the case of a transport problem in 1-D for simplicity (but the 
scheme works in arbitrary dimension). 

Centered and staggered grids. We consider the space domain [0, L\ and the 
time domain [0,1]. We denote N the number of dicretization points in space 
and T the number of discretization points in time. The centered grid is a even 
discretization of the interior of the space-time domain [0, L\ x [0,1], namely 

Qc = | (a:* = L - = ~ -l<i<N,l <j<T 

and the variable discretized on the centered grid are denoted 

y = (p,^C)e£ c = (M^) 3 . 

1 https://github.com/lchizat/optimal-transport/ 
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x — L 


xBxBxBxBx 

• • • • 

xBxBxBxBx 

• • • • 

xBxBxBxBx 


x — 0 


t = 0 


t — 1 


Figure 3: (■) centered and (x, •) staggered grids. The grey rectangle is the 
space-time domain and here N = 3, T = 4. 

Remark that the boundaries are not included in the centered grid. There are as 
many staggered grids as there are dimensions, defined as 

Os = j(*i = ~ Y~ ) : 1 < * < JV + 1, 1<j<t|, 

Gl = {( Xi = L , tj = LlA) : i<i<N,l<j<T+l), 


N T 

and the variables discretized on the staggered grid are denoted by 
U = (p, u, C )es s = E 6 * x R 6 ? x mpc. 

The source £ is still discretized on the centered grid because it intervenes as is 
in the constraint (it is not differentiated). 

Remark now that the continuity constraint is defined for a variable carried 
by the staggered grid while the functional is defined for a variable carried by the 
centered grid. One way out is to keep two variables ([/, V) G £ s x £ c (one on 
each grid) and to add an interpolation constraint between them, i.e. we require 
V — I(U) where / is the midpoint interpolation operator 

t(tt\ _ (rij +1 + ri,j fhi+ij + fhij _ 

1 1'- )i,j - t 2 ’ 2 ’ J ' 

The discrete convex optimization problem that we aim to solve as an approx¬ 
imation of the continuous one reads then 

min D S (V) + l C s(U ) + t {v =x(u)}(.U, V ) 

where the three terms represent the discrete functional, the indicator of the con¬ 
tinuity constraint and the indicator of the interpolation constraint respectively. 
The discretization of the first two terms is carefully described in the next para¬ 
graphs. 


Discrete continuity constraint. The discrete continuity constraint set with 
boundary conditions is defined as 


C£ = <U = (p,a>, C) e£ s :AU = f 0 ,A = 


div — s z 


Sb 


and /o 


n 


bo 


(38) 
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where s z : U \-¥ £, selects the values of U on the boundaries of the staggered 
grids and bo is a vector containing po 5 Pi and zeros for the Neumann boundary 
conditions. The divergence operator div only acts on the components (p, uo) of 
U , takes its values on the centered grid and is defined for U G £ s as 

N 

div(JJ)ij — — (u^+iy — cuij) + T(pij +1 — pij ). 

Discrete functional. First, we use Proposition 3.1 and bring ourselves back to 
the case 5 = 1 by rescaling the space , as this allow to obtain a simpler proximal 
operator. The discrete action is defined as D(V ) =' ^ ke g c f(Pk^kXk) with 

> \ def. \^k\ 2 + Ck 

jyPk^k^k) = —^-• 

2 Pk 

5.2 Minimization Algorithm 

Proximal methods and the Douglas Rachford algorithm. A popular 
class of first order methods for solving non-smooth convex optimization problems, 
so-called proximal splitting schemes, replaces the explicit gradient descent step 
by an implicit descent step using the so-called proximal operator. For a proper, 
convex, l.s.c. function F defined on a Hilbert space T~L with values in RU +oo, 
the proximal operator is the single-valued map defined as 

Prox 7j p(x) = argmin - \\x — x \\ 2 + jF(x) . 
xen 2 

If F is smooth, the optimality condition imply that x d =' Prox 7 ^(x) should 
satisfy x — x — yVTfx), justifying the common interpretation of the proximal 
operator as an implicit gradient step. For the indicator function of a convex 
set, the proximal operator is the projection on this set. For more complicated 
functions, it might be as difficult to compute the proximal operator as solving the 
whole minimization problem. Fortunately, if the function is the sum of simpler 
terms, minimization can be performed by so-called proximal splitting methods. 
There are several ways to combine proximal operators in order to minimize the 
sum of functions. The one we implemented is the Douglas Rachford algorithm 
which allows to minimize the sum of two convex, proper, l.s.c. functions G\ and 
G 2 with the following iterative scheme: choose two initial points G V? 

and define the sequence 

w^ l+ 1) = w® + a(Prox 7 G' 1 ( 2 ^® — w^) — 0®), 
z (rn) = Prox 7G2 (V m )). 

If 0 < a < 2 and 7 > 0, one can show thant z® —>■ z* a minimizer, see [ ]P0 ]. 
Depending on which functions we take as G\ and G 2 , we obtain various algo¬ 
rithms; in our code we used 

Gi(U, V) = lcs(U) + D(V) and G 2 (U, V ) = t v =x(u)(U, V). 

For a detailed review of proximal splitting algorithms, we refer the reader to 
[CPI ]. Let us now compute the proximal operators. 
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Computing Prox 7 £>. The proximal operator of the functional D can be com¬ 
puted in closed form. The following result is a direct adaptation of [ P014]. 

Proposition 5.1. One has 


\/V <E £ c , Prox 7 £>(V) = (Prox 1 f(V k )) ke g i 
where, for all (p, li^jgKxl^xK, 


Prox 7/ (p,cD,C) 



jLC 

P* + 7’ P*+7, 

.( 0 , 0 , 0 ) 


if p* > 0, 
otherwise , 


and p* is the largest real root of the third order polynomial equation in X 
P[X] = {X-P){X + 7 ) 2 - |(|£| 2 + C 2 ) • 


Computing Proj^^. The proximal operator of ice is the projection on the 
affine set C£, which can be computed for U G £ s as 

V ce (U) = U + A*(AA*)~\b 0 -AU) (39) 

= V B (U) - (Id - I b )(s* z - div*)S _1 (s* - di v)(V B (U)) (40) 

where S = (div — s z ){Id — s^sifj ^—V — s*) is the Schur complement of AA *, is 
the identity on the boundaries on zero everywhere else and Vb is the projection 
on the boundary constraints. Given p, we can find u satisfying Su = p by solving 
A u — u +p = 0 on the centered grid with Neumann boundary conditions (on the 
staggered grids). In the Fourier domain, this equation reads 


[1 + (2 — 2cos(7 rm/N))N/L + (2 — 2 cos(7m/T))T] u[m,n\ = p[m,n]. 

The DCT-II transform (with inverse DCT-III ) which coefficients are given by 


N 


•M = E 


u\n\ cos 


n =1 


7 T , 1 X7 

N (n+ 2 )k 


is adapted to our boundary conditions. 


Compute Proj v=x ^jy The projection is given for all couples (t/o, Vo) G (£ s , £ c ) 
by 


Projy = x ([ /)(f7o, Vo) = (U* ,X(U*)) 

with U* = Q~ l {Uf) +X*Vo) and Q — Id+ 1*1. As Q is a tridiagonal matrix, LU 
factorization techniques allow to efficiently invert the system. 
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5.3 Experiments 


Transport of Gaussian bumps. Let us first explore a synthetic case where 
the initial and final measures have the same mass. This case is shown on Figure 4. 
The initial and the final measures are both composed of two Gaussian densities of 
mass 1 and 2 supported on the segment = [0,1]. The modes of the bumps are 
located at, from left to right, 0.2, 0.3, 0.65 and 0.9. The problem is discretized 
with N = 256 samples in space and T — 11 samples in time. The peculiar 
location of the Gaussian bumps allows to highlight the variations of the behavior 
of the geodesics, which is highly dependent on the metric and on the parameter 
5. For the W 2 geodesic (Figure 4a), the rightmost grey bump is split in half and 
a part of it travels to the left side of the graph, giving a interpolated measure 
which, for some applications, is not so natural. The effect of a non-homogeneous 
functional is visible on Figure 4b, where we penalized the L 2 norm of the source 
( in the functional. Non-homogeneity makes the density matter: in this case, 
the source is spread because low densities are favored. Consider now the partial 
transport geodesics: on Figure 4c, we fixed the maximum distance mass can 
travel to 25 = 0.3, while on Figure 4d, that distance is 25 = 0.15. We observe, 
as expected, that the domain = [0,1] is split in an active and an inactive set. 
Also note that as the TV cost is not strictly convex, geodesics are not unique and 
what happens in the inactive set depends on the initialization of the algorithm. 
Finally, Figures 4e and 4f display the geodesics at t — 1/2 for the WF$ metric 
with the cut locus distances set, respectively, at n5 = 0.3 and tt5 = 0.15. Note 
that there is a slight similarity between the configuration here and the hypotheses 
of Theorem 4.2, and the observed behavior can be expected if we extrapolate the 
conclusions of that Theorem. In the first case, we obtain a geodesic consisting of 
two travelling bumps which inflate or deflate. While in the second configuration, 
only one bump travels as tt5 is now smaller than the distance between the two 
bumps at the right. 

Synthetic 2D experiment. We now turn our attention to a synthetic example 
on the domain = [0, l] 2 . The initial density po is the indicator of the ring of 
center (^, |), internal diameter 0.5 and external diameter 0.7. The final density 
pi is the pushforward of p\ by a random smooth map and thus po(fl) — pi(f^). 
The domain is discretized on a centered grid of 64 x 64 samples in space and 
T — 12 samples in time. We compare on Figure 5 the geodesics for four different 
metrics: dpR , W 2 , partial optimal transport (with 25 = 0.2) and WF$ (with 
5n = 0.4). Notice that the two first rows thus show the limit geodesics for WF$ 
when 5 tends to 0 or + 00 . 

Figure 6 helps to better understand the geodesics. On the top row, the 
velocity field shows that the W 2 geodesic transports a lot of mass to the bottom 
left protuberance. On the contrary, metrics allowing local variations of mass 
(partial transport and WF§), attenuate the component of the velocity field which 
is tangential to the ring, behavior which is more consistent with the intuitive 
solution. Finally, by looking at the source maps in the bottom row, we see the 
inactive sets of partial optimal transport with well defined edges and how this 


36 



(c) Partial transport with q = 0.3 


(d) Partial transport with q = 0.15 



A ^ A 


(e) WF$ with q = 0.3 


(f) WFs with ci = 0.15 


Figure 4: Comparison of interpolations for densities on the segment [0,1]. (gray) 
Po (blue) Pi (red) p t =i/ 2 (black) u t=l / 2 (yellow) Ct=i/2- We indicate by q the 
theoretical maximum distance a Dirac can travel, which is a function of 6. 


behavior strongly differs to that of WFs geodesics. 

Biological images interpolation. Interpolating between shapes with varying 
masses is an important motivation for introducing this metric and is the object 
of our third and last numerical experiment. The initial po and pi densities are 
extracted form two segmented images of the same young brain taken at different 
times. This example is rather challenging for image matching algorithms: matter 
is creased, folded and grows unevenly. There are 89 x 73 samples in space, 12 
samples in time and the spatial domain is D = [0, 0.82] x [0,1]. We computed the 
geodesics for W 2 and for WFs with nS = 0.2. We show on Figure 7 the geodesics 
and on Figure 8 the velocity fields and the rate of growth a time t — 1/2. 

Most of the tissue growth is located at the bottom of the domain. Conse¬ 
quently, the velocity field associated to the W 2 geodesic is dominated by top to 
bottom components, as seen on Figure 8a. To the contrary, the geodesic for the 
WF§ metric manages to locally adapt the rate of growth, and the velocity field 
is far more consistent with the true evolution of the tissue. However, we retain 
some artifacts coming from optimal transport: some matter is teared off the 
brain lining and brought somewhere else to fill a need of mass. This behavior, 
inherent to the non-smooth nature of optimal transport plans, is clearly observed 
on Figure 7 near the bulges that appear on the right and left sides of the brain. 

Finally, let us recall the reader that, at optimality, the rate of growth g is 
equal to the dual variable <p and thus, the vector field on Figure 8b is the gradient 
of the rate of growth on Figure 8c, which ranges in [—0.05, 0.18]. 
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Figure 5: Geodesics between po and pi for four metrics (1st row) Fisher-Rao 
(2nd row) W 2 (3rd row) partial optimal transport (4th row) WF$ 


Conclusion 

When looking for generalizations of the dynamic optimal transport problem to 
unbalanced measures, a distance interpolating between optimal transport and 
Fisher Rao appears naturally. It is the only Riemannian-like deriving from a fam¬ 
ily of convex homogeneous dynamic minimization problems. By analyzing the 
effect of varying the interpolation factor, and by studying geodesics for atomic 
measures, we obtained theoretical clues on how the model behaves. These clues 
can be summarized by the two following interpretations: (i) the distance be¬ 
haves as a spatially localized version of optimal transport, in the sense that mass 
whose supports are far away does not interact; (ii) it also behaves as a regu¬ 
larized version of Fisher-Rao, in the sense that, unlike Fisher-Rao, it is stable 
by perturbation of the supports. This theoretical behavior is confirmed by the 
numerical experiments. 
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